Query Labelling for Indic Languages using a hybrid approach
Abstract
With the growth of the internet, the volume of social media text is increasing day by day. Much of the user-generated content on the internet is written very informally. People usually tend to write text on social media using indigenous scripts, and understanding a script different from one's own is a difficult task. Moreover, a large share of the queries that search engines receive today consist of transliterated text. Providing a common platform to deal with transliterated text has therefore become important. This paper presents our approach to labeling queries as part of the FIRE2015 shared task on Mixed-Script Information Retrieval. Tokens in the query are labeled using a hybrid approach that combines rule-based and machine learning techniques. Each annotation is handled separately but sequentially.
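The abstract does not say which rules, features, or classifier the authors used, so the following is only a minimal sketch of what a hybrid token labeler could look like. It assumes that surface-form rules run first and that a statistical model handles whatever the rules leave unlabeled. The label names (`NUM`, `MENTION`, `PUNCT`, `hi`, `en`), the regular expressions, the character-bigram naive Bayes model, and the toy training data are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Rule-based stage (illustrative assumption): tokens whose label is clear
# from their surface form alone. Rules are tried in order.
RULES = [
    (re.compile(r"^\d+([.,]\d+)*$"), "NUM"),
    (re.compile(r"^[@#]\w+$"), "MENTION"),
    (re.compile(r"^\W+$"), "PUNCT"),
]


def char_ngrams(token, n=2):
    """Return character n-grams of a lowercased token with boundary markers."""
    padded = f"^{token.lower()}$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramNaiveBayes:
    """Tiny character-bigram naive Bayes classifier with add-one smoothing."""

    def __init__(self):
        self.counts = defaultdict(Counter)  # label -> n-gram counts
        self.label_totals = Counter()       # label -> number of training tokens
        self.vocab = set()

    def fit(self, examples):
        for token, label in examples:
            self.label_totals[label] += 1
            for gram in char_ngrams(token):
                self.counts[label][gram] += 1
                self.vocab.add(gram)
        return self

    def predict(self, token):
        total = sum(self.label_totals.values())
        vocab_size = len(self.vocab) + 1  # +1 reserves mass for unseen n-grams
        best_label, best_score = None, -math.inf
        for label, n_label in self.label_totals.items():
            denom = sum(self.counts[label].values()) + vocab_size
            score = math.log(n_label / total)
            for gram in char_ngrams(token):
                score += math.log((self.counts[label][gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def label_query(query, model):
    """Label each whitespace-separated token: rules first, model as fallback."""
    labeled = []
    for token in query.split():
        for pattern, label in RULES:
            if pattern.match(token):
                labeled.append((token, label))
                break
        else:
            labeled.append((token, model.predict(token)))
    return labeled


# Toy training data (hypothetical): romanized Hindi vs. English tokens.
TRAIN = [("ghar", "hi"), ("kya", "hi"), ("hai", "hi"),
         ("the", "en"), ("song", "en"), ("lyrics", "en")]

model = NgramNaiveBayes().fit(TRAIN)
result = label_query("tum hi ho lyrics 2015 !", model)
```

Running the rules before the model keeps deterministic cases such as numbers and punctuation out of the classifier's way, which is one plausible reading of "separately but sequentially". A realistic system would need far more training data and richer features than this toy model.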
Similar resources
A Text Input Scheme for Indic Languages with Large Numbers of Printable Characters
This paper discusses design and development of a text-input scheme for phonetic Brahmic languages with a large number of printable characters. We devise an input scheme for an exemplar Indic language with the understanding that the findings are generalizable to other Indic languages. Our results show that a casual user is able to type at a reasonable speed with our approach.
Sangam: A Perso-Arabic to Indic Script Machine Transliteration Model
The Indian subcontinent is one of those unique parts of the world where a single language may be written in different scripts. This is the case, for example, with Punjabi, which is written in Gurmukhi script (a left-to-right script based on Devnagri) in Indian East Punjab and in Shahmukhi (a right-to-left script based on Perso-Arabic) in Pakistani West Punjab. This is also the case with other lang...
Towards Accurate Handwritten Word Recognition for Hindi and Bangla
Building accurate lexicon free handwritten text recognizers for Indic languages is a challenging task, mostly due to the inherent complexities in Indic scripts in addition to the cursive nature of handwriting. In this work, we demonstrate an end-to-end trainable CNN-RNN hybrid architecture which takes inspirations from recent advances of using residual blocks for training convolutional layers, ...
Relational Databases Query Optimization using Hybrid Evolutionary Algorithm
Optimizing database queries is a hard research problem. Exhaustive search techniques such as dynamic programming are suitable for queries with a few relations, but as the number of relations in a query increases, they require so much memory and processing that they become unsuitable, so random and evolutionary methods have to be used instead. The use of evolutionary methods, beca...
Selecting the Most Suitable Query Language for Using Outer Joins to Extract Data in Datalog Mode in the DES Deductive Database System
Deductive Database systems are designed based on a logical data model. In a Deductive Database system, data are saved as facts, as opposed to a Relational Database Management System (RDBMS), in which data are stored in tables. Datalog Educational System (DES) is a Deductive Database system in which Datalog mode is the default mode. It can extract data to use outer joins with three query la...
Publication date: 2015